iT邦幫忙

2024 iThome Ironman Contest (鐵人賽)

DAY 28

Questions

Q30

A company is migrating a legacy application to an Amazon S3 based data lake. A data engineer reviewed data that is associated with the legacy application. The data engineer found that the legacy data contained some duplicate information. The data engineer must identify and remove duplicate information from the legacy application data. Which solution will meet these requirements with the LEAST operational overhead?

  • [ ] A. Write a custom extract, transform, and load (ETL) job in Python. Use the DataFrame.drop_duplicates() function by importing the Pandas library to perform data deduplication.
  • [x] B. Write an AWS Glue extract, transform, and load (ETL) job. Use the FindMatches machine learning (ML) transform to transform the data to perform data deduplication.
  • [ ] C. Write a custom extract, transform, and load (ETL) job in Python. Import the Python dedupe library. Use the dedupe library to perform data deduplication.
  • [ ] D. Write an AWS Glue extract, transform, and load (ETL) job. Import the Python dedupe library. Use the dedupe library to perform data deduplication.

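Description

  • A company is migrating a legacy application and building a data lake on S3.
  • Some of the legacy data is duplicated and must be removed.
  • Choose the option with the lowest operational overhead.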

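Analysis

  • The question asks for the lowest operational overhead. Maintaining your own Python environment (A, C) can't compete with serverless AWS Glue, so rule out A and C.
  • D also runs on Glue, but you would have to package and maintain the third-party dedupe library yourself, whereas FindMatches is built into Glue.
  • FindMatches is an AWS Glue ML transform that identifies duplicate or matching records, even when the records share no unique identifier and no fields match exactly. Read the AWS Glue documentation to see what else it can do.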
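
To see why a matching transform matters for legacy data, the sketch below contrasts exact-match deduplication (the behavior of pandas' drop_duplicates() in option A) with matching on a normalized key. It is only an illustration under stated assumptions: FindMatches learns its matching rules with ML from labeled examples, while normalize() here is a hypothetical hand-written rule, and the records are made up.

```python
# Sketch only: contrasts exact-match deduplication (what pandas'
# drop_duplicates() does) with matching on a normalized key.
# This is NOT how AWS Glue FindMatches works; FindMatches learns
# matching rules with ML. normalize() is a hypothetical rule.


def exact_dedup(records):
    """Keep the first occurrence of each record that matches field by field."""
    seen = set()
    kept = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


def normalize(value):
    """Hypothetical rule: lowercase, drop punctuation, collapse whitespace."""
    cleaned = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def normalized_dedup(records, fields):
    """Keep the first record for each normalized combination of `fields`."""
    seen = set()
    kept = []
    for rec in records:
        key = tuple(normalize(rec[f]) for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


# Made-up legacy records.
legacy = [
    {"name": "Acme Corp", "city": "Seattle"},
    {"name": "Acme Corp", "city": "Seattle"},   # exact duplicate
    {"name": "ACME Corp.", "city": "seattle"},  # near duplicate
    {"name": "Globex", "city": "Portland"},
]

print(len(exact_dedup(legacy)))                         # 3
print(len(normalized_dedup(legacy, ["name", "city"])))  # 2
```

Exact deduplication removes only the identical copy, so "ACME Corp." survives. Catching that kind of near duplicate without hand-writing rules is the job FindMatches takes over.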

Q31

A company is building an analytics solution. The solution uses Amazon S3 for data lake storage and Amazon Redshift for a data warehouse. The company wants to use Amazon Redshift Spectrum to query the data that is in Amazon S3. Which actions will provide the FASTEST queries? (Choose two.)

  • [ ] A. Use gzip compression to compress individual files to sizes that are between 1 GB and 5 GB.
  • [x] B. Use a columnar storage file format.
  • [x] C. Partition the data based on the most common query predicates.
  • [ ] D. Split the data into files that are less than 10 KB.
  • [ ] E. Use file formats that are not splittable.

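Description

  • A company is building an analytics solution that uses S3 as the data lake and Redshift as the data warehouse.
  • Amazon Redshift Spectrum is used to query the data in S3.
  • Which two approaches give the fastest queries?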

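Analysis

  • B: Columnar formats such as Apache Parquet and ORC let Redshift Spectrum read only the columns a query references, so less data is scanned from S3.
  • C: Partitioning on the columns that queries most often filter on lets Redshift Spectrum skip partitions that cannot match the predicate (partition pruning).
  • A: Gzip files are not splittable, so each 1 GB to 5 GB file is read by a single process, which limits parallelism.
  • D: Very small files (under 10 KB) add per-file overhead; AWS recommends keeping files larger than 64 MB.
  • E: Non-splittable formats prevent a file from being read in parallel, which slows queries down.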
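
The effect of options B and C can be estimated with a toy model that counts the bytes a reader must fetch for `SELECT SUM(amount) FROM sales WHERE sale_date = '2024-01-02'`. The sales table, field sizes, and row counts are hypothetical, and fields are treated as uncompressed and fixed-size, so this shows only the direction of the effect, not how Redshift Spectrum actually plans queries.

```python
# Toy model of bytes Redshift Spectrum must read for:
#   SELECT SUM(amount) FROM sales WHERE sale_date = '2024-01-02'
# All sizes and row counts are hypothetical; real Parquet/ORC files are
# compressed and encoded, so this only shows the direction of the effect.

FIELD_BYTES = {"order_id": 8, "sale_date": 10, "amount": 8, "note": 200}
ROWS_PER_DATE = {"2024-01-01": 1_000, "2024-01-02": 1_000, "2024-01-03": 1_000}


def bytes_scanned(wanted_date, partitioned, columnar):
    """Estimate bytes read under a given storage layout."""
    # Partitioned layout: sale_date is encoded in the S3 prefix
    # (e.g. .../sale_date=2024-01-02/), not stored inside the files.
    stored = [f for f in FIELD_BYTES if not (partitioned and f == "sale_date")]
    # Columns the query touches that must come from the files.
    needed = ["amount"] if partitioned else ["amount", "sale_date"]
    # Columnar files allow reading only the needed columns; row files do not.
    read = needed if columnar else stored
    total = 0
    for date, rows in ROWS_PER_DATE.items():
        if partitioned and date != wanted_date:
            continue  # partition pruning: this prefix cannot match the predicate
        total += rows * sum(FIELD_BYTES[f] for f in read)
    return total


for partitioned in (False, True):
    for columnar in (False, True):
        scanned = bytes_scanned("2024-01-02", partitioned, columnar)
        print(f"partitioned={partitioned}, columnar={columnar}: {scanned:,} bytes")
```

With these made-up numbers, the unpartitioned row layout reads 678,000 bytes, columnar alone reads 54,000, partitioning alone reads 216,000, and combining both (B + C) reads 8,000, which is why the two options stack.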

Q32

A company uses Amazon RDS to store transactional data. The company runs an RDS DB instance in a private subnet. A developer wrote an AWS Lambda function with default settings to insert, update, or delete data in the DB instance. The developer needs to give the Lambda function the ability to connect to the DB instance privately without using the public internet. Which combination of steps will meet this requirement with the LEAST operational overhead? (Choose two.)

  • [ ] A. Turn on the public access setting for the DB instance.
  • [ ] B. Update the security group of the DB instance to allow only Lambda function invocations on the database port.
  • [x] C. Configure the Lambda function to run in the same subnet that the DB instance uses.
  • [x] D. Attach the same security group to the Lambda function and the DB instance. Include a self-referencing rule that allows access through the database port.
  • [ ] E. Update the network ACL of the private subnet to include a self-referencing rule that allows access through the database port.

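Description

  • A company stores transactional data in RDS.
  • The RDS DB instance runs in a private subnet.
  • A developer uses a Lambda function (default settings) to insert, update, and delete data in the DB instance.
  • The Lambda function must be able to connect to the DB instance privately, without going over the public internet.
  • Which combination has the lowest operational overhead?
  • Shortlist: C + D, or B + C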

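Analysis

  • A: Turning on public access exposes the DB instance to the internet, which contradicts the requirement. Rule it out.
  • B: Security group rules match on protocol, port, and source (a CIDR range or another security group); there is no rule that matches "Lambda function invocations". Even read as "allow the Lambda function's own security group", the Lambda function and the DB would use different groups, so you would have to maintain two security groups.
  • C: A Lambda function with default settings does not run inside your VPC, so it cannot reach a DB instance in a private subnet. Connecting it to the VPC, in the DB instance's subnet, keeps the traffic private.
  • D: Only one security group to maintain; the self-referencing rule allows any resource that has the group attached to reach the database port.
  • E: Network ACL rules are defined by CIDR ranges and cannot reference themselves, so a "self-referencing" NACL rule is not possible.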
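
Why option D works can be shown with a simplified model of security group ingress evaluation. This is a sketch under assumptions: the group IDs (sg-shared, sg-lambda, sg-db) and the port are hypothetical, and the model ignores protocols, CIDR sources, port ranges, NACLs, and routing. It shows that attaching the same group to both sides is not enough on its own; the self-referencing rule is what admits the traffic.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class SecurityGroup:
    """Simplified security group: ingress rules are (port, source_group_id) pairs.

    Real rules also carry a protocol, port ranges, and CIDR sources; those are
    omitted in this sketch.
    """
    group_id: str
    ingress: List[Tuple[int, str]] = field(default_factory=list)


def allowed(source_groups: Iterable[SecurityGroup],
            target_groups: Iterable[SecurityGroup],
            port: int) -> bool:
    """True if some SG on the target admits `port` from some SG on the source.

    Security groups are stateful, so return traffic needs no rule. The source's
    outbound rules are assumed to be the default (allow all).
    """
    source_ids = {sg.group_id for sg in source_groups}
    return any(
        rule_port == port and source_id in source_ids
        for sg in target_groups
        for rule_port, source_id in sg.ingress
    )


PORT = 5432  # hypothetical: PostgreSQL's default port; use your engine's port

# Option D: one shared group with a self-referencing rule.
shared = SecurityGroup("sg-shared", ingress=[(PORT, "sg-shared")])
print(allowed([shared], [shared], PORT))  # True

# Same group attached to both, but without the self-referencing rule.
no_rule = SecurityGroup("sg-no-rule")
print(allowed([no_rule], [no_rule], PORT))  # False

# Two-group alternative: the DB's group must reference the Lambda's group.
lambda_sg = SecurityGroup("sg-lambda")
db_sg = SecurityGroup("sg-db", ingress=[(PORT, "sg-lambda")])
print(allowed([lambda_sg], [db_sg], PORT))  # True, but two groups to maintain
```

In the real EC2 API, a self-referencing rule is an ingress permission whose `UserIdGroupPairs` source is the group's own ID (for example via boto3's `authorize_security_group_ingress`). Option C corresponds to the Lambda function's `VpcConfig` (`SubnetIds`, `SecurityGroupIds`).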

Series: 老闆,外帶一份 AWS Certified Data Engineer ("Boss, one AWS Certified Data Engineer to go"), 30 articles